Simple Network Visualisation with R

There are a number of user-friendly tools for visualizing networks that don’t require any programming knowledge, including Cytoscape and Gephi, among others. The language R, however, has become a popular and powerful platform for cleaning and analysing data and for visualizing texts, networks, and geographical information. R benefits from a large ecosystem of open source packages, and in recent years a collection of them that has come to be known as the Tidyverse has made the process of exploring data significantly easier. On the network analysis front, mature packages like statnet and igraph are joined by newer ones, including the pair tidygraph and ggraph, that make it fairly easy to visualize and explore networks in R.

This tutorial was designed for history students in a masters level skills module at the University of St Andrews MLitt programme in Global, Transnational, and Spatial history to get a first taste of how R might be used to explore historical networks. In this exercise we will practice creating some simple network visualisations using a fictional network of East Asian gangsters and revolutionaries.

Prerequisites: My students working with this tutorial R Notebook have done a little previous work with R and text analysis, using material from Text Mining with R by Julia Silge & David Robinson and Text Analysis with R for Students of Literature by Matthew Jockers. They have also read some of A User’s Guide to Network Analysis in R on igraph and completed the DataCamp module Introduction to the Tidyverse. I would suggest trying this tutorial if you have had at least some basic introduction to R and some familiarity with RStudio.

This website was created as an R Notebook, which you can use if you have R installed and open the file in RStudio. You can download the files used in this tutorial here in the GitHub repository. If you open this notebook in RStudio, you will see the code and can run all of it in one go with cmd/ctrl-option-r. Alternatively, you can run the code from a single section using cmd/ctrl-shift-enter. Many of the questions below ask you to see what happens when you tweak some of the code found here.

For this exercise you need the packages:

readr dplyr forcats tibble ggplot2 ggraph tidygraph igraph

In RStudio, choose “Install Packages…” from the Tools menu, paste in the above list of packages (separated, as they are, by spaces), and press Install. After the packages are installed (you may have had some of them already), load them as follows:
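The loading chunk is not reproduced in this text version. A minimal sketch of what it presumably looks like, assuming all eight packages are already installed:

```r
# Load the packages used in this tutorial (assumes they are installed).
library(readr)     # reading CSV files
library(dplyr)     # data manipulation verbs such as mutate() and filter()
library(forcats)   # factor helpers such as fct_reorder()
library(tibble)    # tidy data frames
library(ggplot2)   # plotting
library(igraph)    # network objects and network analysis
library(tidygraph) # tidy interface to network data
library(ggraph)    # ggplot2-style network plots
```

Messages about masked functions when loading are normal and can be ignored here.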

Importing the Data

Great. Now we need to get our data into R. Put the nodes.csv and edges.csv files that I have shared with you into the working directory where you should be keeping this notebook file (or set the working directory to the right place in the Session menu).

We will now load the nodes into a nodes data frame and edges into an edges data frame. The “head” command with 10 as a parameter will give you a peek at the contents of each file.
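The import chunk is not shown here. As a rough sketch, assuming the files are comma-separated and read with readr's read_csv(): with the real files the calls would point at nodes.csv and edges.csv, but so that the example runs without them, a few rows copied from the printed tables are passed as literal CSV text wrapped in I() (this needs readr 2.0 or later).

```r
library(readr)

# With the real files in the working directory, the calls would presumably be:
#   nodes <- read_csv("nodes.csv")
#   edges <- read_csv("edges.csv")
# Stand-in: literal CSV text holding a few rows copied from the printed tables.
nodes <- read_csv(I("person,location,age,nationality,mentions,discuss,gender
Tomohiko,Tokyo,22,Japan,14,1,m
Jiurong,Shanghai,44,China,3,0,f
Yoshinobu,Tokyo,66,Japan,67,1,m
Wei,Qingdao,57,China,30,1,f
"), show_col_types = FALSE)

edges <- read_csv(I("from,to,kind,intensity,year_start,year_end
Jiurong,Tomohiko,3,1,1896,1947
Tomohiko,Jiurong,3,1,1910,1936
Yoshinobu,Tomohiko,3,2,1898,1915
"), show_col_types = FALSE)

# head() with 10 as its second argument shows at most the first ten rows.
head(nodes, 10)
head(edges, 10)
```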

person location age nationality mentions discuss gender
Tomohiko Tokyo 22 Japan 14 1 m
Kyŏngmin Seoul 55 Korea 12 0 m
Jiurong Shanghai 44 China 3 0 f
Sangok Pusan 33 Korea 5 0 m
Yoshinobu Tokyo 66 Japan 67 1 m
Wei Qingdao 57 China 30 1 f
Songbae Seoul 36 Korea 26 0 m
Minjun Pusan 55 Korea 4 0 m
Hayun Pusan 22 Korea 2 0 m
Minjae Pusan 30 Korea 12 0 m
Hyŏnu Pusan 21 Korea 7 0 m
Hyejin Pusan 18 Korea 9 0 f
Sunja Pusan 21 Korea 1 0 f
Chŏngsu Taegu 20 Korea 22 1 m
Yŏngsik Taegu 35 Korea 1 0 m
Pyŏngho Taegu 31 Korea 3 0 m
Guiying Harbin 28 China 6 1 f
Xiuying Harbin 24 China 14 0 f
Guoran Dalian 22 China 7 1 m
Yŏngsu Tokyo 31 Korea 34 0 f

from to kind intensity year_start year_end
Chŏngsu Minjae 3 3 1907 1921
Hayun Minjae 3 1 1902 1943
Jiurong Tomohiko 3 1 1896 1947
Kyŏngmin Jiurong 3 1 1895 1920
Minjae Chŏngsu 3 3 1907 1921
Takamasa Kei 3 2 1898 1934
Tomohiko Jiurong 3 1 1910 1936
Wei Guoran 3 3 1872 1920
Yoshinobu Tomohiko 3 2 1898 1915
Chŏngsu Yŏngsik 2 1 1919 1931

Notice that the nodes have age, nationality, location, and mentions (let us say this is the number of times they appear in some source or collection of sources). I have also added an arbitrary binary discuss column where I have manually flagged a few important characters I might want to emphasise.

When preparing a collection of nodes and edges for network visualization, it is usually best to have a column in the nodes table with unique id numbers that are used as a reference key to all other information about that agent. Then, in the edges table, you would see only the relevant id numbers instead of the names. However, for this simple example, to increase the readability of the files as we learn the basics, I have chosen to use the given names of the fictional individuals without any special id column (there are just one or two real given names that fit the description of individuals for this network, to add to the fun for East Asian historians).

Creating a Simple Network

Let us create a network object from our nodes and edges:
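The original chunk is not shown. A minimal sketch, assuming igraph's graph_from_data_frame() is used and the network is directed; the object name network and the small inline tables (a few rows and columns copied from the printed data) are stand-ins for illustration:

```r
library(tibble)
library(igraph)

# A few rows copied from the printed tables (not the full data set).
nodes <- tibble(
  person      = c("Tomohiko", "Jiurong", "Yoshinobu", "Wei", "Guoran"),
  nationality = c("Japan", "China", "Japan", "China", "China"),
  mentions    = c(14, 3, 67, 30, 7)
)
edges <- tibble(
  from      = c("Jiurong", "Tomohiko", "Wei", "Yoshinobu", "Jiurong", "Tomohiko"),
  to        = c("Tomohiko", "Jiurong", "Guoran", "Tomohiko", "Wei", "Wei"),
  kind      = c(3, 3, 3, 3, 1, 1),
  intensity = c(1, 1, 3, 2, 2, 1)
)

# The first two columns of `edges` give the endpoints of each edge; the first
# column of `nodes` is matched against them and becomes the vertex `name`.
network <- graph_from_data_frame(d = edges, vertices = nodes, directed = TRUE)
network
```

Every name that appears in the from or to columns must also appear in the first column of the node table, otherwise graph_from_data_frame() stops with an error.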

This creates an igraph network object, a format that ggraph and most of its features understand. Later in this exercise we will convert this to a tidygraph tibble graph. For now, we can very easily create a simple graph diagram using the ggraph() command. It works in a very similar fashion to ggplot, which it extends. You tell it the network to use, assign a layout type, then add options. In this case we will simply add a geom_edge_link(), which will give us the edges, and a geom_node_point(), which will display points.
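As a sketch of the call just described, assuming the Kamada-Kawai layout ("kk") and a small stand-in network built from a few rows of the printed tables (the name network is hypothetical):

```r
library(tibble)
library(igraph)
library(ggraph)

# Stand-in network: a few edges copied from the printed edge table.
edges <- tibble(
  from = c("Jiurong", "Tomohiko", "Wei", "Yoshinobu", "Jiurong", "Tomohiko"),
  to   = c("Tomohiko", "Jiurong", "Guoran", "Tomohiko", "Wei", "Wei")
)
network <- graph_from_data_frame(edges, directed = TRUE)

# Network first, then the layout, then one layer per thing to draw.
p <- ggraph(network, layout = "kk") +
  geom_edge_link() +   # the edges, drawn as straight lines
  geom_node_point()    # the nodes, drawn as points
p
```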

This is very simple. We can see that it is placed on x and y axes and looks like a kind of special ggplot diagram. There are lots of things we might want to do to improve this.

Adding Labels

Let us start by adding labels to the graph. Under geom_node_point() we will add a geom_node_label(). The aesthetics we give it connect its label to the name column of our nodes and set the font face.

Then, back outside the aesthetics aes(), we will set alpha=0.6, which makes the label 60% opaque. This may seem like something we would put inside the aesthetic, but because we are giving it a specific value, and not mapping it to our data, it goes outside. This will allow us to see any edge lines and nodes behind the label.

We also add the repel = TRUE here to help with the formatting of the location of the labels.
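A sketch of the label layer described above, on a small stand-in network (the name network is hypothetical). The bold font face is an assumption, since the text does not say which face was chosen:

```r
library(tibble)
library(igraph)
library(ggraph)

# Stand-in network: a few edges copied from the printed edge table.
edges <- tibble(
  from = c("Jiurong", "Tomohiko", "Wei", "Yoshinobu", "Jiurong", "Tomohiko"),
  to   = c("Tomohiko", "Jiurong", "Guoran", "Tomohiko", "Wei", "Wei")
)
network <- graph_from_data_frame(edges, directed = TRUE)

p <- ggraph(network, layout = "kk") +
  geom_edge_link() +
  geom_node_point() +
  geom_node_label(
    aes(label = name, fontface = "bold"), # inside aes(): label text and font face
    alpha = 0.6,                          # outside aes(): one fixed value for all labels
    repel = TRUE                          # nudge labels so they do not overlap
  )
p
```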

Questions 1

Try the following questions.

  1. What happens if you remove the repel=TRUE (remember to cut out the trailing comma too)?
  2. How would you show the age instead of the name?
  3. What would you do if you wanted to remove the nodes altogether and just use labels instead of nodes? Try this without transparency and with the repel feature removed.

Adding a Theme

Both ggplot and ggraph can work with “themes” that store lots of custom settings that we can apply to our graphs. You can store a theme in a function that calls the theme and then add that theme function to any graph you create. You might, for example, create several to match different purposes. See the R for Data Science book, ggplot2: Elegant Graphics for Data Analysis, or the DataCamp class Communicating with Data in R (Tidyverse), or just run ?theme for more on themes.

Let us create a theme to use for our graphs. It will make the background a light grey, extend the margins, and remove the axis text, ticks, and titles. It will also remove all the grid lines.

We need only call network_theme() at the end of our graphs to apply these settings.
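The theme function itself is not shown in this text version. A minimal sketch of a function with the listed effects; the exact grey ("grey95") and the margin sizes are assumptions:

```r
library(ggplot2)

# Hypothetical reconstruction of a theme function with the effects listed above.
network_theme <- function() {
  theme(
    plot.background  = element_rect(fill = "grey95", colour = NA), # light grey background
    panel.background = element_rect(fill = "grey95", colour = NA),
    plot.margin      = margin(20, 20, 20, 20),                     # wider margins (points)
    axis.text        = element_blank(),                            # no axis text
    axis.ticks       = element_blank(),                            # no axis ticks
    axis.title       = element_blank(),                            # no axis titles
    panel.grid       = element_blank()                             # no grid lines
  )
}

# Usage: add it as the last layer of any ggplot or ggraph plot.
p <- ggplot(data.frame(x = 1:3, y = 1:3), aes(x, y)) +
  geom_point() +
  network_theme()
p
```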

Now in our next graph, notice the changes caused by our theme and no other changes:

Adding Labels and More

Now let us begin adding some more things to our graph. Using the labs() function, we can add a title and, at the bottom right, a caption on the source of the data. Also in labs() we can rename the legend titles. Notice I use the escaped newline character (\n) in one case to create a two-line legend header. Notice that, in the case of the edge width, I had to use an “edge_” prefix before naming the legend header.

We’ll also make some other additions to our graph diagram. In the geom_edge_link() aesthetics, we will tell it to vary the width of the edge by the intensity column of our data. Then, outside the aesthetics, we will fix the color of the lines to a mid-level grey.

We can control the scale of the width with the scale_edge_width() function, which sets the range to a minimum of 0.2 in width and a maximum of 2, scaling the numbers to something within that range.

For our nodes, our aes() now scales the size of the node by the number of mentions in the sources, and the color of the nodes according to the nationality.
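Put together, the additions described in this section might look like the sketch below. The title and caption strings, the grey level ("grey50"), and the object name network are placeholders; the data are a few rows copied from the printed tables:

```r
library(tibble)
library(igraph)
library(ggraph)

nodes <- tibble(
  person      = c("Tomohiko", "Jiurong", "Yoshinobu", "Wei", "Guoran"),
  nationality = c("Japan", "China", "Japan", "China", "China"),
  mentions    = c(14, 3, 67, 30, 7)
)
edges <- tibble(
  from      = c("Jiurong", "Tomohiko", "Wei", "Yoshinobu", "Jiurong", "Tomohiko"),
  to        = c("Tomohiko", "Jiurong", "Guoran", "Tomohiko", "Wei", "Wei"),
  intensity = c(1, 1, 3, 2, 2, 1)
)
network <- graph_from_data_frame(edges, vertices = nodes, directed = TRUE)

p <- ggraph(network, layout = "kk") +
  # edge width mapped to intensity; colour fixed outside aes()
  geom_edge_link(aes(width = intensity), edge_colour = "grey50") +
  # rescale the mapped widths to lie between 0.2 and 2
  scale_edge_width(range = c(0.2, 2)) +
  # node size by mentions, node colour by nationality
  geom_node_point(aes(size = mentions, colour = nationality)) +
  geom_node_label(aes(label = name), alpha = 0.6, repel = TRUE) +
  labs(
    title      = "A fictional network",     # placeholder title
    caption    = "Source: fictional data",  # placeholder caption
    size       = "Mentions\nin sources",    # \n gives a two-line legend header
    colour     = "Nationality",
    edge_width = "Intensity"                # edge legends need the edge_ prefix
  )
p
```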

Questions 2

  1. How would you change the code so that the transparency varied according to mentions?
  2. How would you vary the color by location rather than nationality?

Creating a Subgraph by Filtering on an Edge attribute

One of the variables we haven’t used yet is the kind column in the edge data, which is a number from 1 to 3. What if we wanted to create a second diagram that shows only those relationships which are of kind 3?

The tidygraph library has a nice activate() method that allows you to manipulate nodes and edges or filter them in various ways. Instead of calling activate(edges) before manipulating edges, there is also a nice shortcut: %E>% in place of the usual pipe to work with your edges, or %N>% to work with your nodes. For this we need to take our igraph network and convert it to a tbl_graph with as_tbl_graph(), and then we can use the filter() command to find just the edges which have a kind==3. If we graphed this immediately, we would see the filtered edges, but also a number of isolated nodes no longer connected to the rest of the graph. We can activate the node layer and then filter out the isolated nodes with filter(!node_is_isolated()).
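The filtering steps above can be sketched as follows, on a small stand-in network built from rows of the printed edge table (the object names are hypothetical):

```r
library(tibble)
library(igraph)
library(tidygraph)

# A few edges copied from the printed edge table, with their kind.
edges <- tibble(
  from = c("Jiurong", "Tomohiko", "Wei", "Yoshinobu", "Jiurong", "Minjun"),
  to   = c("Tomohiko", "Jiurong", "Guoran", "Tomohiko", "Wei", "Hayun"),
  kind = c(3, 3, 3, 3, 1, 1)
)
network <- graph_from_data_frame(edges, directed = TRUE)

network_kind3 <- as_tbl_graph(network) %E>%  # convert, then work on the edges
  filter(kind == 3) %N>%                      # keep kind 3, then switch to the nodes
  filter(!node_is_isolated())                 # drop nodes left without any edge

network_kind3
```

In this sample, Minjun and Hayun are linked only by a kind 1 edge, so both disappear from the filtered graph.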

Questions 4

  1. Go back and change the filter to look only for edges of kind 2, then again for kind 1.
  2. Instead of filtering by kind, create a graph diagram of only the people based in Tokyo, or only the Koreans in the network.
  3. How would you create a subgraph showing only those whose relationship year_start was before 1890 and year_end after 1910? You can do this with two filter commands, or with a compound & statement.

Community Detection

Network scientists have developed a variety of algorithms to detect communities in a network. While the analytical value of this algorithmically derived grouping in the context of historical research may be limited, for larger networks it can help you identify clusters to explore. For more on this, read the chapter on “Subgroups” in the book A User’s Guide to Network Analysis in R. The tidygraph package inherits many of the community detection algorithms embedded in igraph and makes them available to us, including Edge-betweenness (group_edge_betweenness), Leading eigenvector (group_leading_eigen), Fast-greedy (group_fast_greedy), Louvain (group_louvain), Walktrap (group_walktrap), Label propagation (group_label_prop), InfoMAP (group_infomap), Spinglass (group_spinglass), and Optimal (group_optimal). Some community algorithms are designed to take into account direction or weight, while others ignore them. Below we try Walktrap, which is not, in fact, designed for directed networks; try comparing its results with those of other community detection algorithms and note the differences.
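A minimal sketch of the Walktrap step, assuming the grouping is stored in a new node column (here called community, a hypothetical name) on a small stand-in network:

```r
library(tibble)
library(igraph)
library(tidygraph)

# Stand-in network: a few edges copied from the printed edge table.
edges <- tibble(
  from = c("Jiurong", "Tomohiko", "Wei", "Yoshinobu", "Jiurong", "Tomohiko"),
  to   = c("Tomohiko", "Jiurong", "Guoran", "Tomohiko", "Wei", "Wei")
)
network <- graph_from_data_frame(edges, directed = TRUE)

# group_walktrap() returns one group number per node; storing it as a factor
# makes it convenient to map to a discrete colour scale later.
network_communities <- as_tbl_graph(network) %N>%
  mutate(community = as.factor(group_walktrap()))

node_table <- as_tibble(network_communities)
node_table
```

Swapping group_walktrap() for any of the other group_ functions listed above changes the algorithm without changing the rest of the code.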

Questions 5

  1. Had you grouped the nodes manually, would you have divided up the graph into “communities” along these lines? Which assignments by the algorithm look out of place to you?
  2. Try the other community detection algorithms and compare the results.

Bimodal Networks

Bimodal, bipartite, or affiliation networks have two different types of nodes and generally only link between the two types of nodes. As the term “affiliation network” suggests, this is often in the form of the affiliation of an individual to an organisation of some kind.

Let us import a list of edges between individuals and organisations.

From To
Tomohiko Toilers of the Great East
Jiurong Green Crane Society
Minjun Workers Alliance
Hyejin East Wind
Yoshinobu Kawakami-gumi
Wei Great Harmony Society
Wei Green Crane Society
Hyejin Toilers of the Great East
Kyŏngmin Toilers of the Great East
Sangok East Wind
Songbae Toilers of the Great East
Hayun Toilers of the Great East
Minjae Toilers of the Great East
Hyŏnu Toilers of the Great East
Hyejin Workers Alliance
Sunja Toilers of the Great East
Chŏngsu Workers Alliance
Yŏngsu Red Wave Association
Chŏngsu Toilers of the Great East
Yŏngsik Toilers of the Great East
Pyŏngho Toilers of the Great East
Guiying Green Crane Society
Xiuying Great Harmony Society
Guoran Great Harmony Society
Guoran East Wind
Xiuying East Wind
Minjun East Wind
Takamasa Kawakami-gumi
Masahirō Kawakami-gumi
Kei Iwaguchi-gumi
Senjūrō Iwaguchi-gumi
Michiō Iwaguchi-gumi
Yōsuke Kawakami-gumi
Yoshinobu Toilers of the Great East
Kikue Red Wave Association
Kanno Red Wave Association
Fumiko Red Wave Association
Fumiko Toilers of the Great East
Zhen Toilers of the Great East
Jongmyung Workers Alliance
Zhen Great Harmony Society
Masahirō Iwaguchi-gumi

We now have a table with relationships between individuals and organisations, but it would be nice to create a merged node table which joins all the attribute information for organisations, which includes the location of each organisation’s headquarters, with all the attribute data for individuals. We can use full_join() for this.
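A sketch of the join, assuming the organisation table stores the organisation name in a column also called person (as the merged table's header suggests) alongside HQ; the table names and the three sample rows per table are for illustration only:

```r
library(tibble)
library(dplyr)

nodes <- tibble(
  person      = c("Tomohiko", "Yoshinobu", "Zhen"),
  location    = c("Tokyo", "Tokyo", "Yizheng"),
  nationality = c("Japan", "Japan", "China")
)
organisations <- tibble(
  person = c("Toilers of the Great East", "Kawakami-gumi", "Great Harmony Society"),
  HQ     = c("Pusan", "Tokyo", "Beijing")
)

# full_join() keeps every row of both tables; columns that exist in only one
# table are filled with NA for rows coming from the other table.
all_nodes <- full_join(nodes, organisations, by = "person")
all_nodes
```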

person location age nationality mentions discuss gender HQ
Tomohiko Tokyo 22 Japan 14 1 m NA
Kyŏngmin Seoul 55 Korea 12 0 m NA
Jiurong Shanghai 44 China 3 0 f NA
Sangok Pusan 33 Korea 5 0 m NA
Yoshinobu Tokyo 66 Japan 67 1 m NA
Wei Qingdao 57 China 30 1 f NA
Songbae Seoul 36 Korea 26 0 m NA
Minjun Pusan 55 Korea 4 0 m NA
Hayun Pusan 22 Korea 2 0 m NA
Minjae Pusan 30 Korea 12 0 m NA
Hyŏnu Pusan 21 Korea 7 0 m NA
Hyejin Pusan 18 Korea 9 0 f NA
Sunja Pusan 21 Korea 1 0 f NA
Chŏngsu Taegu 20 Korea 22 1 m NA
Yŏngsik Taegu 35 Korea 1 0 m NA
Pyŏngho Taegu 31 Korea 3 0 m NA
Guiying Harbin 28 China 6 1 f NA
Xiuying Harbin 24 China 14 0 f NA
Guoran Dalian 22 China 7 1 m NA
Yŏngsu Tokyo 31 Korea 34 0 f NA
Yōsuke Nagoya 26 Japan 10 0 m NA
Kei Osaka 24 Japan 3 0 m NA
Senjūrō Kagoshima 35 Japan 1 0 m NA
Masahirō Kagoshima 41 Japan 1 0 m NA
Takamasa Kōchi 45 Japan 4 0 m NA
Michiō Niigata 37 Japan 1 0 m NA
Kanno Osaka 32 Japan 44 1 f NA
Fumiko Seoul 29 Japan 31 1 f NA
Kikue Tokyo 40 Japan 14 0 f NA
Zhen Yizheng 30 China 29 1 f NA
Jongmyung Seoul 23 Korea 10 0 f NA
Toilers of the Great East NA NA NA NA NA NA Pusan
Green Crane Society NA NA NA NA NA NA Beijing
Workers Alliance NA NA NA NA NA NA Seoul
East Wind NA NA NA NA NA NA Shanghai
Kawakami-gumi NA NA NA NA NA NA Tokyo
Iwaguchi-gumi NA NA NA NA NA NA Kagoshima
Great Harmony Society NA NA NA NA NA NA Beijing
Red Wave Association NA NA NA NA NA NA Tokyo

Now we can create a network object from this merged information. In order to keep track of what nodes are part of each mode (individuals or organisations) we’ll add a type column to the node data that will get a TRUE value if it is one of the organisations.
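One way this could be done, sketched on a few rows from the printed tables. It assumes organisations are recognised by having a non-missing HQ, and that the affiliation network is undirected; the object names are hypothetical:

```r
library(tibble)
library(dplyr)
library(igraph)

affiliations <- tibble(
  From = c("Tomohiko", "Yoshinobu", "Yoshinobu", "Zhen", "Zhen"),
  To   = c("Toilers of the Great East", "Kawakami-gumi", "Toilers of the Great East",
           "Toilers of the Great East", "Great Harmony Society")
)
all_nodes <- tibble(
  person = c("Tomohiko", "Yoshinobu", "Zhen",
             "Toilers of the Great East", "Kawakami-gumi", "Great Harmony Society"),
  HQ     = c(NA, NA, NA, "Pusan", "Tokyo", "Beijing")
) %>%
  mutate(type = !is.na(HQ))  # TRUE for organisations, FALSE for individuals

# igraph treats a logical vertex attribute called `type` as the bipartite mode.
bimodal <- graph_from_data_frame(affiliations, vertices = all_nodes, directed = FALSE)
bimodal
```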

Now we can create a graph diagram of our bimodal network. In the code, I have made a few customisations to our usual graphs above by setting the shape of the node to correspond to whether it is an individual or an organisation, choosing a circle (ggplot shape number 19) or a square (15). I increased the fig.width chunk option to make the chart wider, and used ifelse() conditionals to distinguish the organisations by color and to assign labels only to individuals.
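A rough sketch of these customisations on a small stand-in bimodal network. The legend strings and the fixed node size are assumptions, and the exact form of the original ifelse() calls is not known:

```r
library(tibble)
library(igraph)
library(ggraph)

affiliations <- tibble(
  From = c("Tomohiko", "Yoshinobu", "Yoshinobu", "Zhen", "Zhen"),
  To   = c("Toilers of the Great East", "Kawakami-gumi", "Toilers of the Great East",
           "Toilers of the Great East", "Great Harmony Society")
)
all_nodes <- tibble(
  person = c("Tomohiko", "Yoshinobu", "Zhen",
             "Toilers of the Great East", "Kawakami-gumi", "Great Harmony Society"),
  type   = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)  # TRUE marks an organisation
)
bimodal <- graph_from_data_frame(affiliations, vertices = all_nodes, directed = FALSE)

p <- ggraph(bimodal, layout = "kk") +
  geom_edge_link(edge_colour = "grey50") +
  geom_node_point(
    aes(shape = type, colour = ifelse(type, "Organisation", "Individual")),
    size = 4
  ) +
  # circle (19) for individuals, square (15) for organisations
  scale_shape_manual(values = c(`FALSE` = 19, `TRUE` = 15)) +
  # NA labels are dropped when drawing, so only individuals get a label
  geom_node_label(aes(label = ifelse(type, NA, name)), repel = TRUE) +
  labs(shape = "Organisation?", colour = "Node type")
p
```

Drawing this plot produces a warning about removed rows; that is the organisations' missing labels being skipped.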

Note: If you run this code in RStudio, note the difference between the appearance of the plots within RStudio and the exported web page version.

You can also use a special bipartite layout for the graph that produces a hierarchical look. Sometimes the tree layout will produce a desirable effect as well.

Bimodal graphs are nice for visualising the connections between two different types of things. As Scott Weingart has argued in several web posts, including his overview of bimodal networks, they are significantly more difficult to analyse using formal network analysis methods, including the challenge of exploring various forms of centrality or clustering coefficients.

They are valuable, however, as a heuristic visualisation to explore your network and discover new questions, or areas to focus in on for more research. They can also serve more simple illustrative purposes when you are exploring a historical network in your narrative and want to illustrate visually relationships between individuals and organisations or some other combination of two modes even without formal analysis being carried out.

One transformation of bimodal networks that can be particularly useful, especially for networks larger than the one we are dealing with here, is to explore connections between the nodes in one mode by means of their connections to the other mode. In our historical example, we might explore how organisations are connected through the members who tie them together, or how individuals are connected by virtue of sharing membership in an organisation. These are called projections of bimodal networks.

To create these projections we can use the igraph function bipartite.projection(). This will create a list with two projections, proj1 and proj2, one for each mode. Let us assign each one to its own network object and then plot them.
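A sketch of the projection step on a small stand-in bimodal network. It uses the snake_case spelling bipartite_projection(), which current igraph versions prefer over the dotted name; the object names are hypothetical:

```r
library(tibble)
library(igraph)

affiliations <- tibble(
  From = c("Tomohiko", "Yoshinobu", "Yoshinobu", "Zhen", "Zhen"),
  To   = c("Toilers of the Great East", "Kawakami-gumi", "Toilers of the Great East",
           "Toilers of the Great East", "Great Harmony Society")
)
all_nodes <- tibble(
  person = c("Tomohiko", "Yoshinobu", "Zhen",
             "Toilers of the Great East", "Kawakami-gumi", "Great Harmony Society"),
  type   = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)  # TRUE marks an organisation
)
bimodal <- graph_from_data_frame(affiliations, vertices = all_nodes, directed = FALSE)

projections <- bipartite_projection(bimodal)
individuals_net   <- projections$proj1  # nodes with type FALSE (individuals)
organisations_net <- projections$proj2  # nodes with type TRUE (organisations)

# By default each projected edge carries a `weight`: the number of shared
# neighbours in the other mode, which can be mapped to edge width when plotting.
E(individuals_net)$weight
E(organisations_net)$weight
```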

The lines here are thicker in the cases where members were more linked to each other by mutual membership in multiple organisations. In the second plot we see that four of the organisations each share two members. Not terribly revealing in this case, but with much larger networks, this may reveal interlocking organisations with overlapping memberships that might not be immediately obvious by perusing a table of membership data.

One Plot to Rule Them All

Bimodal networks include only connections between two different modes. But there is nothing preventing you from flattening a bimodal graph and including all the edges from our unimodal network. That is, you can create a visualisation, for illustrative or heuristic purposes, that depicts both relationships between individuals and between these individuals and the organisations. Please note that if formal analysis plays any role in your exploration of these networks, this is not methodologically sound for any number of reasons. Among the issues is that we are mixing a directed network (of individuals) with an undirected network (of affiliations).

To create our mega plot, we will use bind_rows() to merge the edge table of relationships between individuals and organisations with that of relationships between individuals. For simplicity, we will first assign an intensity of 1 and a kind of 4 to all affiliation relationships, and leave all date info as NA. We’ll also standardise the column names, since “From” and “To” are capitalised in one table and not in the other. mutate() makes it easy to rename the columns.
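The merge might be sketched like this, using a few rows from the printed tables. The object names are hypothetical, and select() is used here to drop the old capitalised columns after mutate() has copied them:

```r
library(tibble)
library(dplyr)

edges <- tibble(
  from       = c("Jiurong", "Tomohiko", "Yoshinobu"),
  to         = c("Tomohiko", "Jiurong", "Tomohiko"),
  kind       = c(3, 3, 3),
  intensity  = c(1, 1, 2),
  year_start = c(1896, 1910, 1898),
  year_end   = c(1947, 1936, 1915)
)
affiliations <- tibble(
  From = c("Tomohiko", "Yoshinobu"),
  To   = c("Toilers of the Great East", "Kawakami-gumi")
)

affiliation_edges <- affiliations %>%
  mutate(
    from = From, to = To,        # lower-case copies of the endpoint columns
    kind = 4, intensity = 1,     # fixed values for every affiliation edge
    year_start = NA_real_, year_end = NA_real_
  ) %>%
  select(from, to, kind, intensity, year_start, year_end)

# Stack the two edge tables; columns are matched by name.
all_edges <- bind_rows(edges, affiliation_edges)
all_edges
```

dplyr's rename(from = From, to = To) would be an alternative way to standardise the column names.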

We can then visualise all the edges together, and use various visual features to help make the plot more readable, but anyone who has used software such as Cytoscape, for example, will see that it is much easier to customise the visualisation of multiple networks together there than here, as far as I have been able to determine. Especially if the aim is just to explore your data as a part of the research and thinking process, then Cytoscape is a much easier alternative to R and igraph/ggraph.

from to kind intensity year_start year_end
Chŏngsu Minjae 3 3 1907 1921
Hayun Minjae 3 1 1902 1943
Jiurong Tomohiko 3 1 1896 1947
Kyŏngmin Jiurong 3 1 1895 1920
Minjae Chŏngsu 3 3 1907 1921
Takamasa Kei 3 2 1898 1934
Tomohiko Jiurong 3 1 1910 1936
Wei Guoran 3 3 1872 1920
Yoshinobu Tomohiko 3 2 1898 1915
Chŏngsu Yŏngsik 2 1 1919 1931
Chŏngsu Pyŏngho 2 1 1901 1939
Guiying Guoran 2 5 1905 1944
Hayun Hyejin 2 1 1885 1940
Hyejin Minjae 2 3 1885 1930
Hyejin Sangok 2 1 1887 1947
Jiurong Yoshinobu 2 1 1911 1928
Kyŏngmin Tomohiko 2 1 1901 1925
Masahirō Takamasa 2 1 1910 1918
Minjun Hyŏnu 2 2 1891 1932
Minjun Hyejin 2 2 1883 1923
Wei Xiuying 2 3 1881 1943
Wei Guiying 2 5 1886 1935
Yoshinobu Yōsuke 2 3 1888 1910
Kanno Kei 3 2 1870 1914
Kei Kanno 3 2 1870 1914
Yoshinobu Kei 2 2 1902 1920
Songbae Guoran 3 3 1892 1913
Yoshinobu Michiō 2 4 1865 1945
Yoshinobu Masahirō 2 5 1890 1951
Chŏngsu Sunja 1 2 1906 1927
Guiying Xiuying 1 4 1888 1945
Jiurong Wei 1 2 1886 1932
Kyŏngmin Yoshinobu 1 2 1920 1940
Masahirō Senjūrō 1 4 1902 1910
Minjun Hayun 1 1 1899 1927
Minjun Minjae 1 1 1895 1926
Minjun Yŏngsu 1 1 1909 1925
Senjūrō Masahirō 1 4 1899 1938
Sunja Chŏngsu 1 2 1906 1927
Tomohiko Jiurong 1 1 1918 1924
Tomohiko Wei 1 1 1914 1925
Tomohiko Songbae 1 2 1901 1932
Tomohiko Yoshinobu 1 1 1903 1926
Wei Jiurong 1 2 1886 1932
Xiuying Guiying 1 4 1888 1945
Yoshinobu Takamasa 1 3 1870 1933
Kanno Takamasa 3 3 1918 1938
Fumiko Yōsuke 3 2 1870 1922
Fumiko Hayun 3 4 1901 1930
Fumiko Hyejin 2 1 1870 1938
Kikue Kei 2 2 1910 1970
Zhen Guoran 3 3 1870 1899
Zhen Jiurong 3 5 1918 1900
Jongmyung Sunja 3 2 1870 1938
Jongmyung Yŏngsik 2 3 1882 1910
Pyŏngho Jongmyung 3 3 1870 1938
Sangok Fumiko 1 5 1890 1920
Kikue Fumiko 3 2 1900 1910
Xiuying Fumiko 3 3 1880 1910
Tomohiko Toilers of the Great East 4 1 NA NA
Jiurong Green Crane Society 4 1 NA NA
Minjun Workers Alliance 4 1 NA NA
Hyejin East Wind 4 1 NA NA
Yoshinobu Kawakami-gumi 4 1 NA NA
Wei Great Harmony Society 4 1 NA NA
Wei Green Crane Society 4 1 NA NA
Hyejin Toilers of the Great East 4 1 NA NA
Kyŏngmin Toilers of the Great East 4 1 NA NA
Sangok East Wind 4 1 NA NA
Songbae Toilers of the Great East 4 1 NA NA
Hayun Toilers of the Great East 4 1 NA NA
Minjae Toilers of the Great East 4 1 NA NA
Hyŏnu Toilers of the Great East 4 1 NA NA
Hyejin Workers Alliance 4 1 NA NA
Sunja Toilers of the Great East 4 1 NA NA
Chŏngsu Workers Alliance 4 1 NA NA
Yŏngsu Red Wave Association 4 1 NA NA
Chŏngsu Toilers of the Great East 4 1 NA NA
Yŏngsik Toilers of the Great East 4 1 NA NA
Pyŏngho Toilers of the Great East 4 1 NA NA
Guiying Green Crane Society 4 1 NA NA
Xiuying Great Harmony Society 4 1 NA NA
Guoran Great Harmony Society 4 1 NA NA
Guoran East Wind 4 1 NA NA
Xiuying East Wind 4 1 NA NA
Minjun East Wind 4 1 NA NA
Takamasa Kawakami-gumi 4 1 NA NA
Masahirō Kawakami-gumi 4 1 NA NA
Kei Iwaguchi-gumi 4 1 NA NA
Senjūrō Iwaguchi-gumi 4 1 NA NA
Michiō Iwaguchi-gumi 4 1 NA NA
Yōsuke Kawakami-gumi 4 1 NA NA
Yoshinobu Toilers of the Great East 4 1 NA NA
Kikue Red Wave Association 4 1 NA NA
Kanno Red Wave Association 4 1 NA NA
Fumiko Red Wave Association 4 1 NA NA
Fumiko Toilers of the Great East 4 1 NA NA
Zhen Toilers of the Great East 4 1 NA NA
Jongmyung Workers Alliance 4 1 NA NA
Zhen Great Harmony Society 4 1 NA NA
Masahirō Iwaguchi-gumi 4 1 NA NA

Now let us create a new network object with this merged edge table and our previously merged node table and plot the results:

## Warning: Using size for a discrete variable is not advised.

Other Layouts

Up until now we have been mostly using the Kamada-Kawai layout algorithm to determine the look of our network. A range of other layouts is available simply by changing the layout type.

Below see our graph with the Fruchterman-Reingold layout.

There is also a “circular” layout, which takes a bit more tweaking of the parameters and size to get it to fit well:
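A sketch of how the layout is swapped, on a small stand-in network (object names are hypothetical). It assumes the layout names "fr" for Fruchterman-Reingold and "circle" for the circular layout; coord_fixed() is an addition here to keep the circle round:

```r
library(tibble)
library(igraph)
library(ggraph)

# Stand-in network: a few edges copied from the printed edge table.
edges <- tibble(
  from = c("Jiurong", "Tomohiko", "Wei", "Yoshinobu", "Jiurong", "Tomohiko"),
  to   = c("Tomohiko", "Jiurong", "Guoran", "Tomohiko", "Wei", "Wei")
)
network <- graph_from_data_frame(edges, directed = TRUE)

# Fruchterman-Reingold: only the layout argument changes.
p_fr <- ggraph(network, layout = "fr") +
  geom_edge_link() +
  geom_node_point() +
  geom_node_label(aes(label = name), size = 2.2)

# Circular layout: nodes are placed evenly around a circle.
p_circle <- ggraph(network, layout = "circle") +
  geom_edge_link() +
  geom_node_point() +
  geom_node_label(aes(label = name), size = 2.2) +
  coord_fixed()

p_fr
p_circle
```

The Fruchterman-Reingold layout starts from random positions, so the picture changes between runs unless you call set.seed() first.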

Questions 6 Playing with the Layouts

  1. Try changing the layout="" value to each of the following layouts in turn: sugiyama, star, dh, gem, graphopt, drl, and compare the results.
  2. Why did I add the fig.width=5 option in the declaration of the r code section for the circular layout? What happens if you remove it?
  3. Why did I hard code the font size, size=2.2, outside of the aes() for the geom_node_label()? What happens if you cut that out?
  4. What happens if you add another ggplot option (don’t forget the + on the end of the previous line!) with coord_cartesian(xlim=c(-1.5,1.5),ylim=c(-1.5,1.5))?
  5. How could I colour the labels by the nationality of the nodes? By the location of the members?
  6. How could I set it so that the size of the labels changes according to the age of the members of the network?
  7. How could I limit the range of the size of the fonts from sizes 2 to 4?

Adding Some Network Analysis

Although we have been using the ggraph package to visualise our network, the graph itself is an igraph object and can take advantage of all the analytical tools in igraph:

Look how easy it is to add columns, using our trusty dplyr mutate(), with the betweenness, closeness, and eigenvector centrality computed for our nodes, together with the total, in, and out degrees.
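A sketch of that step on a small stand-in network, assuming the measures are added to the node table with igraph's own functions. The column names other than degree_all are hypothetical. It relies on the vertices being in the same order as the rows of the node table, which holds when the graph was built from that table:

```r
library(tibble)
library(igraph)
library(dplyr)

nodes <- tibble(person = c("Tomohiko", "Jiurong", "Yoshinobu", "Wei", "Guoran"))
edges <- tibble(
  from = c("Jiurong", "Tomohiko", "Wei", "Yoshinobu", "Jiurong", "Tomohiko"),
  to   = c("Tomohiko", "Jiurong", "Guoran", "Tomohiko", "Wei", "Wei")
)
network <- graph_from_data_frame(edges, vertices = nodes, directed = TRUE)

# Each igraph function returns one value per vertex, in vertex order.
nodes_measures <- nodes %>%
  mutate(
    betweenness = betweenness(network),
    closeness   = closeness(network),
    eigenvector = eigen_centrality(network)$vector,
    degree_all  = degree(network, mode = "all"),
    degree_in   = degree(network, mode = "in"),
    degree_out  = degree(network, mode = "out")
  )
nodes_measures
```

In a small directed sample like this one, closeness may come back as NaN for nodes with no outgoing edges.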

Note: If your graph is in tidygraph you can also use the wide variety of centrality_ prefixed functions.

We can do a quick comparison of in, out and total degree of the nodes, which measures the outgoing and incoming relationships, or their total, minus any overlapping edges. Notice I used the fct_reorder() function from the forcats library to re-sort the names by their total degree (degree_all). Comment out that line to see what happens to the graph.
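One possible shape for such a comparison, sketched on a small stand-in network. The plot type (points on a shared axis) and the object names are assumptions; only the use of fct_reorder() on degree_all comes from the text:

```r
library(tibble)
library(igraph)
library(dplyr)
library(forcats)
library(ggplot2)

# Stand-in network: a few edges copied from the printed edge table.
edges <- tibble(
  from = c("Jiurong", "Tomohiko", "Wei", "Yoshinobu", "Jiurong", "Tomohiko"),
  to   = c("Tomohiko", "Jiurong", "Guoran", "Tomohiko", "Wei", "Wei")
)
network <- graph_from_data_frame(edges, directed = TRUE)

degrees <- tibble(
  person     = V(network)$name,
  degree_all = degree(network, mode = "all"),
  degree_in  = degree(network, mode = "in"),
  degree_out = degree(network, mode = "out")
) %>%
  # Re-sort the names by total degree; comment this line out to compare.
  mutate(person = fct_reorder(person, degree_all))

p <- ggplot(degrees, aes(y = person)) +
  geom_point(aes(x = degree_in,  colour = "in")) +
  geom_point(aes(x = degree_out, colour = "out")) +
  geom_point(aes(x = degree_all, colour = "all")) +
  labs(x = "Degree", y = NULL, colour = "Measure")
p
```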

With this data we could easily plot the relationship between various kinds of centrality. Betweenness centrality is a measure of the degree to which a node is a gatekeeper to other nodes. How many of the shortest paths between nodes must pass through a given node? Eigenvector centrality tries to judge the importance of a node by the relative connectivity of its neighbors. Read more about it here. Let us compare the two in our own network:

How about the relationship between eigenvector centrality and another measure, closeness centrality? Closeness centrality is a measure of how close a given node is to all the other nodes.

Now that we have all this information, we can also redo our network graph using any of these measures. Let us get a network graph that incorporates all the new variables we have added to the node table:

For example, here is a graph diagram with the size of the node changed to indicate its betweenness.

Questions 7

  1. How would you change this to colour by location, but size by closeness centrality? Or eigenvector centrality?
  2. How would you create a ggplot that showed the relationship between betweenness centrality and the mentions in the sources?
  3. Challenge: How would you create a ggplot that visualized the comparison of the average betweenness of women in the network compared to men?
  4. Challenge: What steps would you need to go through to compare the total density (ratio of the number of edges to the number of possible edges) of the members of the network in each of the three nationalities? What about in each location? How could you plot this in a simple bar graph? You may have to do some exploring in the documentation for igraph or ggraph.

This should give you a good start at creating network graph diagrams using R. See some of these resources for more:

Books

Luke, Douglas A. A User’s Guide to Network Analysis in R. 1st ed. 2015 edition. Cham Hildesheim New York: Springer, 2015.

Wickham, Hadley. ggplot2: Elegant Graphics for Data Analysis. 2nd ed. 2016 edition. New York, NY: Springer, 2016.

Scott, John. Social Network Analysis. 3rd ed., 2013.

Scott, John, and Peter J. Carrington, eds. The SAGE Handbook of Social Network Analysis. London ; Thousand Oaks, Calif: SAGE, 2011.

Wasserman, Stanley, and Katherine Faust. Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.

This R Notebook was written with the help of various books and tutorials mentioned above, but mostly thanks to 40-60 google searches, with the answers found generally on the websites above, Stack Overflow, and obscure online bulletin boards.